Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models
Large language models (LLMs) based on decoder-only transformers have demonstrated superior text understanding capabilities compared to CLIP and T5-series models. However, the paradigm for utilizing current advanced LLMs in text-to-image diffusion models remains to be explored. We observed an unusual phenomenon: directly using a large language model as the prompt encoder significantly degrades the prompt-following ability in image generation.
Beyond Content: How Grammatical Gender Shapes Visual Representation in Text-to-Image Models
Saeed, Muhammed, Raza, Shaina, Vayani, Ashmal, Abdul-Mageed, Muhammad, Emami, Ali, Shehata, Shady
Research on bias in Text-to-Image (T2I) models has primarily focused on demographic representation and stereotypical attributes, overlooking a fundamental question: how does grammatical gender influence visual representation across languages? We introduce a cross-linguistic benchmark examining words where grammatical gender contradicts stereotypical gender associations (e.g., "une sentinelle", grammatically feminine in French but referring to the stereotypically masculine concept "guard"). Our dataset spans five gendered languages (French, Spanish, German, Italian, Russian) and two gender-neutral control languages (English, Chinese), comprising 800 unique prompts that generated 28,800 images across three state-of-the-art T2I models. Our analysis reveals that grammatical gender dramatically influences image generation: masculine grammatical markers increase male representation to 73% on average (compared to 22% with gender-neutral English), while feminine grammatical markers increase female representation to 38% (compared to 28% in English). These effects vary systematically by language resource availability and model architecture, with high-resource languages showing stronger effects. Our findings establish that language structure itself, not just content, shapes AI-generated visual outputs, introducing a new dimension for understanding bias and fairness in multilingual, multimodal systems.
Where's the liability in the Generative Era? Recovery-based Black-Box Detection of AI-Generated Content
Bai, Haoyue, Sun, Yiyou, Cheng, Wei, Chen, Haifeng
The recent proliferation of photorealistic images created by generative models has sparked both excitement and concern, as these images are increasingly indistinguishable from real ones to the human eye. While offering new creative and commercial possibilities, the potential for misuse, such as in misinformation and fraud, highlights the need for effective detection methods. Current detection approaches often rely on access to model weights or require extensive collections of real image datasets, limiting their scalability and practical application in real-world scenarios. In this work, we introduce a novel black-box detection framework that requires only API access, sidestepping the need for model weights or large auxiliary datasets. Our approach leverages a corrupt-and-recover strategy: by masking part of an image and assessing the model's ability to reconstruct it, we measure the likelihood that the image was generated by the model itself. For black-box models that do not support masked-image inputs, we incorporate a cost-efficient surrogate model trained to align with the target model's distribution, enhancing detection capability. Our framework demonstrates strong performance, outperforming baseline methods by 4.31% in mean average precision across eight diffusion model variant datasets. Code is publicly available at https://github.com/
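The corrupt-and-recover strategy above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `inpaint_fn` stands in for the target model's masked-inpainting API (or the surrogate model), and the detection threshold is a placeholder, not a value from the paper.

```python
import numpy as np

def recovery_score(image, inpaint_fn, mask):
    """Corrupt-and-recover check: erase a masked region, ask the
    model to fill it back in, and score how closely the
    reconstruction matches the original pixels.

    A low reconstruction error suggests the image lies on the
    model's own distribution, i.e. it was likely generated by it.
    """
    corrupted = image.copy()
    corrupted[mask] = 0.0                    # erase the masked region
    recovered = inpaint_fn(corrupted, mask)  # black-box API call
    # mean squared error restricted to the masked pixels
    return float(np.mean((recovered[mask] - image[mask]) ** 2))

def is_model_generated(image, inpaint_fn, mask, threshold=0.01):
    """Flag the image as model-generated when the model can
    reconstruct the erased region almost perfectly."""
    return recovery_score(image, inpaint_fn, mask) < threshold
```

In practice the mask placement, error metric, and threshold calibration all matter; this sketch only shows the control flow of masking, recovering, and scoring.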
Test-time Prompt Refinement for Text-to-Image Models
Khan, Mohammad Abdul Hafeez, Jain, Yash, Bhattacharyya, Siddhartha, Vineet, Vibhav
Text-to-image (T2I) generation models have made significant strides but still struggle with prompt sensitivity: even minor changes in prompt wording can yield inconsistent or inaccurate outputs. To address this challenge, we introduce a closed-loop, test-time prompt refinement framework that requires no additional training of the underlying T2I model, termed TIR. In our approach, each generation step is followed by a refinement step, where a pretrained multimodal large language model (MLLM) analyzes the output image and the user's prompt. The MLLM detects misalignments (e.g., missing objects, incorrect attributes) and produces a refined and physically grounded prompt for the next round of image generation. By iteratively refining the prompt and verifying alignment between the prompt and the image, TIR corrects errors, mirroring the iterative refinement process of human artists. We demonstrate that this closed-loop strategy improves alignment and visual coherence across multiple benchmark datasets, all while maintaining plug-and-play integration with black-box T2I models.
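The closed-loop generate-critique-regenerate cycle described above can be sketched as follows. This is a hypothetical skeleton, not TIR's actual code: `generate_fn` stands in for the black-box T2I model and `critique_fn` for the MLLM's misalignment check, which here returns a refined prompt or `None` when the image is judged aligned.

```python
def refine_loop(prompt, generate_fn, critique_fn, max_rounds=3):
    """Closed-loop test-time refinement: generate an image, have
    the MLLM critique it against the user's prompt, and regenerate
    with the refined prompt until aligned or out of rounds."""
    image = generate_fn(prompt)
    for _ in range(max_rounds):
        refined = critique_fn(image, prompt)  # None means aligned
        if refined is None:
            break
        prompt = refined                      # adopt the MLLM's fix
        image = generate_fn(prompt)           # regenerate
    return image, prompt
```

Because both callables are opaque, the same loop works unchanged with any API-only T2I model, which is the plug-and-play property the abstract highlights.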
Using complex prompts to identify fine-grained biases in image generation through ChatGPT-4o
There are not one but two dimensions of bias that can be revealed through the study of large AI models: not only bias in training data or the products of an AI, but also bias in society, such as disparity in employment or health outcomes between different demographic groups. Often training data and AI output are biased for or against certain demographics (e.g., older white people are overrepresented in image datasets), but sometimes large AI models accurately illustrate biases in the real world (e.g., young black men being disproportionately viewed as threatening). These social disparities often appear in image generation AI outputs in the form of 'marked' features, where some feature of an individual or setting is a social marker of disparity, and prompts both humans and AI systems to treat subjects that are marked in this way as exceptional and requiring special treatment. Generative AI has proven to be very sensitive to such marked features, to the extent of over-emphasising them and thus often exacerbating social biases. I briefly discuss how we can use complex prompts to image generation AI to investigate either dimension of bias, emphasising how we can probe the large language models underlying image generation AI through, for example, automated sentiment analysis of the text prompts used to generate images.
Evaluating Compositional Scene Understanding in Multimodal Generative Models
Fu, Shuhao, Lee, Andrew Jun, Wang, Anna, Momennejad, Ida, Bihl, Trevor, Lu, Hongjing, Webb, Taylor W.
The visual world is fundamentally compositional. Visual scenes are defined by the composition of objects and their relations. Hence, it is essential for computer vision systems to reflect and exploit this compositionality to achieve robust and generalizable scene understanding. While major strides have been made toward the development of general-purpose, multimodal generative models, including both text-to-image models and multimodal vision-language models, it remains unclear whether these systems are capable of accurately generating and interpreting scenes involving the composition of multiple objects and relations. In this work, we present an evaluation of the compositional visual processing capabilities in the current generation of text-to-image (DALL-E 3) and multimodal vision-language models (GPT-4V, GPT-4o, Claude Sonnet 3.5, QWEN2-VL-72B, and InternVL2.5-38B), and compare the performance of these systems to human participants. The results suggest that these systems display some ability to solve compositional and relational tasks, showing notable improvements over the previous generation of multimodal models, but with performance nevertheless well below the level of human participants, particularly for more complex scenes involving many ($>5$) objects and multiple relations. These results highlight the need for further progress toward compositional understanding of visual scenes.
The erasure of intensive livestock farming in text-to-image generative AI
Sheng, Kehan, Tuyttens, Frank A. M., von Keyserlingk, Marina A. G.
Generative AI (e.g., ChatGPT) is increasingly integrated into people's daily lives. While it is known that AI perpetuates biases against marginalized human groups, its impact on non-human animals remains understudied. We found that ChatGPT's text-to-image model (DALL-E 3) introduces a strong bias toward romanticizing livestock farming, depicting dairy cows on pasture and pigs rooting in mud. This bias remained when we requested realistic depictions and was only mitigated when the automatic prompt revision was inhibited. Most farmed animals in industrialized countries are reared indoors with limited space per animal, conditions which fail to resonate with societal values. Inhibiting prompt revision resulted in images that more closely reflected modern farming practices; for example, cows housed indoors accessing feed through metal headlocks, and pigs behind metal railings on concrete floors in indoor facilities. While OpenAI introduced prompt revision to mitigate bias, in the case of farmed animal production systems, it paradoxically introduces a strong bias towards unrealistic farming practices.
Un-Straightening Generative AI: How Queer Artists Surface and Challenge the Normativity of Generative AI Models
Taylor, Jordan, Mire, Joel, Spektor, Franchesca, DeVrio, Alicia, Sap, Maarten, Zhu, Haiyi, Fox, Sarah
Queer people are often discussed as targets of bias, harm, or discrimination in research on generative AI. However, the specific ways that queer people engage with generative AI, and thus possible uses that support queer people, have yet to be explored. We conducted a workshop study with 13 queer artists, during which we gave participants access to GPT-4 and DALL-E 3 and facilitated group sensemaking activities. We found our participants struggled to use these models due to various normative values embedded in their designs, such as hyper-positivity and anti-sexuality. We describe various strategies our participants developed to overcome these models' limitations and how, nevertheless, our participants found value in these highly-normative technologies. Drawing on queer feminist theory, we discuss implications for the conceptualization of "state-of-the-art" models and consider how FAccT researchers might support queer alternatives.
DebiasPI: Inference-time Debiasing by Prompt Iteration of a Text-to-Image Generative Model
Bonna, Sarah, Huang, Yu-Cheng, Novozhilova, Ekaterina, Paik, Sejin, Shan, Zhengyang, Feng, Michelle Yilin, Gao, Ge, Tayal, Yonish, Kulkarni, Rushil, Yu, Jialin, Divekar, Nupur, Ghadiyaram, Deepti, Wijaya, Derry, Betke, Margrit
Ethical intervention prompting has emerged as a tool to counter demographic biases of text-to-image generative AI models. Existing solutions either require retraining the model or struggle to generate images that reflect desired distributions on gender and race. We propose an inference-time process called DebiasPI for Debiasing-by-Prompt-Iteration that provides prompt intervention by enabling the user to control the distributions of individuals' demographic attributes in image generation. DebiasPI keeps track of which attributes have been generated either by probing the internal state of the model or by using external attribute classifiers. Its control loop guides the text-to-image model to select not yet sufficiently represented attributes. With DebiasPI, we were able to create images with equal representations of race and gender that visualize challenging concepts of news headlines. We also experimented with the attributes age, body type, profession, and skin tone, and measured how attributes change when our intervention prompt targets the distribution of an unrelated attribute type. We found, for example, that if the text-to-image model is asked to balance racial representation, gender representation improves but the skin tone becomes less diverse. Attempts to cover a wide range of skin colors with various intervention prompts showed that the model struggles to generate the palest skin tones. We conducted various ablation studies, in which we removed DebiasPI's attribute control, which reveal the model's propensity to generate young, male characters.
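The attribute-tracking control loop that DebiasPI describes can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the authors' code: `generate_fn` stands in for the T2I model, `classify_fn` for an external attribute classifier, and the prompt template is a hypothetical example of a prompt intervention.

```python
from collections import Counter

def debias_loop(base_prompt, attributes, generate_fn, classify_fn, n_images):
    """Inference-time control loop in the spirit of DebiasPI:
    track which attribute values have appeared so far and steer
    each new generation toward the least-represented one."""
    counts = Counter({a: 0 for a in attributes})
    images = []
    for _ in range(n_images):
        # pick the attribute value seen least often so far
        target = min(attributes, key=lambda a: counts[a])
        image = generate_fn(f"{base_prompt}, depicting a {target} person")
        counts[classify_fn(image)] += 1  # classifier checks what was generated
        images.append(image)
    return images, counts
```

Counting the classifier's verdict rather than the requested attribute is the key point: as the abstract notes, the model does not always comply with the intervention prompt, so the loop must track what was actually generated.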